1.  INTRODUCTION

When a random signal is sampled and digitally analyzed,
the results of the analysis are estimates of the true parameters
or statistics describing the random signal. It is often difficult
to assign a level of confidence to such an estimate. That is, the
estimate can never be said to equal the true parameter value, but
it can be said to lie within specified confidence limits, or error
bars, with a given probability. The confidence limits are highly
dependent upon the characteristics of the particular random signal
being analyzed and the parameter being estimated.

The most important single characteristic is the number
of independent samples available from the random signal. For a
random signal which contains both amplitude and phase information,
the number of independent samples available is

$$N_I = 2BT \quad \text{(amplitude and phase samples)}$$

where

$N_I$ - is the number of independent samples;

$B$ - is the bandwidth of the random signal; and

$T$ - is the signal duration (the length of the
sampled function).

In a random process which contains only amplitude information,
such as at the output of a detector-averager, half the independent
samples have been destroyed, so that

$$N_I = BT \quad \text{(amplitude or phase samples).}$$
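
The two cases above can be captured in a few lines. The following is only a minimal sketch; the function name `independent_samples` and its flag are illustrative assumptions, not part of the memorandum.

```python
def independent_samples(bandwidth_hz, duration_s, amplitude_and_phase=True):
    """Return N_I = 2BT when both amplitude and phase are retained,
    or N_I = BT when only amplitude (or only phase) is available.

    The function name and arguments are hypothetical, chosen for illustration.
    """
    bt = bandwidth_hz * duration_s
    return 2.0 * bt if amplitude_and_phase else bt


# A 100 Hz bandwidth signal observed for one second.
print(independent_samples(100.0, 1.0))         # amplitude and phase
print(independent_samples(100.0, 1.0, False))  # e.g. detector-averager output
```

Note that the sampling rate does not appear anywhere in the calculation: only the bandwidth and the duration matter.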

This memorandum contains the necessary procedures for
determining confidence intervals for several types of estimates
that occur frequently in sonar signal analysis. Covered in
Section 2 are confidence intervals for the sample mean and sample
variance from a normal population. Section 3 contains a description
of the chi-square goodness-of-fit test, used to test the
equivalence of a measured probability density function for sampled
data to some hypothesized density function. Section 4 describes the
Kolmogorov (K) statistic, used to set a confidence band about an
entire probability distribution function; and a binomial statistic,
used to set confidence limits about each point on an empirical
distribution function. References are listed in Section 5.


2.  PARAMETER  ESTIMATION 
2.1  SAMPLE  MEAN 

The sample mean of a random signal is defined as

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$

where

$\bar{x}$ - is the sample mean;

$N$ - is the number of samples analyzed; and

$x_i$ - is the value of the $i$th sample.

Note that the $x_i$ are random variables described by some type of
statistics. The sample mean is formed by a linear combination of
the $x_i$, thus $\bar{x}$ is also a random variable and may only be
considered as an estimate of the true mean, $\mu_x$; that is,

$$\bar{x} = \hat{\mu}_x,$$

where the "^" denotes "estimate."

A confidence interval for the mean value estimate is then
some interval about the sample mean within which the true mean
value lies with a given probability, on the average. For the
purposes of Section 2 of this memorandum, the $x_i$ will be assumed
to be samples of a normal random variable. A confidence interval
for the sample mean of a normal population is then [1, page 140]

$$(1-\alpha) = \mathrm{Prob}\left[\bar{x} - \frac{s\,t_{n;\alpha/2}}{\sqrt{N_I}} \le \mu_x \le \bar{x} + \frac{s\,t_{n;\alpha/2}}{\sqrt{N_I}}\right]$$

where

$(1-\alpha)$ - is the probability that $\mu_x$ will lie within
the confidence interval;

$s$ - is the unbiased sample standard deviation
defined in Section 2.2;

$N_I$ - is the number of independent samples;

$n$ - is the number of degrees of freedom ($n = N_I - 1$
for the sample mean); and

$t_{n;\alpha/2}$ - is the value of the variable, $t_n$, in Student's
t distribution with $n$ degrees of freedom, such
that $\mathrm{Prob}(t_n > t_{n;\alpha/2}) = \alpha/2$. (The Student t
density is symmetric about $t = 0$.)

Tables of the Student t distribution may be used to
calculate this interval, or the curves in Fig. 1 may be used.
These curves are plots of $t_{n;\alpha/2}/\sqrt{N_I}$ versus $n$ for
different values of $\alpha$. It should be emphasized at this point
that it is the number of independent samples, $N_I$, that enters this
calculation, no matter how great the sampling rate. The curves in
Fig. 1 arise from the assumption that the samples $x_i$ are from a
normal population; however, for $N_I > 10$ the sampling distribution
for $\bar{x}$ approaches a normal distribution [1, page 136].

2.2  SAMPLE  VARIANCE 

The sample variance of a random signal is defined as

$$s^2 = \hat{\sigma}_x^2 = \left(\frac{N_I}{N_I - 1}\right) \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$$

where

$s^2$ - is the sample variance (unbiased estimate);

$N_I$ - is the number of independent samples;

$N$ - is the number of samples analyzed;

$x_i$ - is the value of the $i$th sample; and

$\sigma_x^2$ - is the true variance.

This definition of the sample variance arises from the unbiased
estimate in which only independent samples are taken [1, pages 125, 126].
Again, assuming that the $x_i$ are samples from a normal
population, a confidence interval for the sample variance may be
defined as [1, page 140]

$$(1-\alpha) = \mathrm{Prob}\left[\frac{n s^2}{\chi^2_{n;\alpha/2}} \le \sigma_x^2 \le \frac{n s^2}{\chi^2_{n;1-\alpha/2}}\right].$$


Once again it is emphasized that the above confidence
interval depends upon the number of degrees of freedom, $n$, no
matter how great the sampling rate. A high sampling rate allows
easier and more detailed practical reconstruction of the original
analog waveform but does not, in theory, affect the confidence
limits.

Fig. 2  NORMALIZED LOWER CONFIDENCE LIMITS FOR GAUSSIAN SAMPLE VARIANCE.

Fig. 3  NORMALIZED UPPER CONFIDENCE LIMITS FOR GAUSSIAN SAMPLE VARIANCE.

TABLE I  NORMALIZED UPPER CONFIDENCE LIMITS
FOR GAUSSIAN SAMPLE VARIANCE, ($1 \le n \le 5$).

Degrees of
Freedom, n     $n/\chi^2_{n;0.90}$     $n/\chi^2_{n;0.95}$     $n/\chi^2_{n;0.975}$

    1               63.82                  257.21                 1018.26
    2                9.49                   19.50                   39.50
    3                5.13                    8.53                   13.90
    4                3.76                    5.63                    8.26
    5                3.11                    4.36                    6.02


EXAMPLE

Consider a random time function described by Gaussian
statistics which has a duration of one second with a known bandwidth
of 100 Hz. The sample mean is found to be 1.0, and the
sample variance is found to be 4.0, that is

$\bar{x} = 1.0$ and
$s^2 = 4.0$.

The number of independent samples, $N_I$, is

$N_I = 2BT$

$= 2(100)(1)$

$N_I = 200$ independent samples.

Determine a 90% confidence interval for the true mean value $\mu_x$
and the true variance $\sigma_x^2$.

For the mean value confidence interval, we have the
relationship of Section 2.1, which gives

$$0.9 = \mathrm{Prob}\left[1.0 - 2.0\left(\frac{t_{199;0.05}}{\sqrt{200}}\right) \le \mu_x \le 1.0 + 2.0\left(\frac{t_{199;0.05}}{\sqrt{200}}\right)\right].$$

Thus, given $n = N_I - 1 = 199$ degrees of freedom, the appropriate
$t_{n;\alpha/2}/\sqrt{N_I}$ value comes from the ordinate of the 90%
curve at abscissa $n = 199$. This value is approximately 0.117. Thus a
90% confidence statement about the true mean value is

$$0.9 = \mathrm{Prob}\left[1.0 - (2.0)(0.117) \le \mu_x \le 1.0 + (2.0)(0.117)\right]$$

$$0.9 = \mathrm{Prob}\left[0.766 \le \mu_x \le 1.234\right].$$

That is, there is 90% confidence that the true mean $\mu_x$ will lie
between 0.766 and 1.234 when $\bar{x} = 1.0$, $s^2 = 4.0$, and $N_I = 200$.
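
The mean-value interval of this example can also be checked numerically rather than read from Fig. 1. The sketch below is an illustration under stated assumptions: it uses only the Python standard library, and it replaces the Student t table with a two-term Cornish-Fisher expansion about the normal quantile, which is an approximation that works well for large $n$. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def student_t_upper(n, tail):
    """Approximate t_{n;tail}, defined by Prob(t_n > t_{n;tail}) = tail.

    Assumption: a two-term Cornish-Fisher expansion about the normal
    quantile stands in for a Student t table; it is accurate for large n.
    """
    z = NormalDist().inv_cdf(1.0 - tail)
    g1 = (z**3 + z) / 4.0
    g2 = (5.0 * z**5 + 16.0 * z**3 + 3.0 * z) / 96.0
    return z + g1 / n + g2 / n**2


def mean_confidence_interval(xbar, s, n_indep, alpha):
    """(1 - alpha) interval: xbar -/+ s * t_{n;alpha/2} / sqrt(N_I), n = N_I - 1."""
    n = n_indep - 1
    half_width = s * student_t_upper(n, alpha / 2.0) / math.sqrt(n_indep)
    return xbar - half_width, xbar + half_width


# Example values from the text: xbar = 1.0, s^2 = 4.0 (s = 2.0), N_I = 200.
lo, hi = mean_confidence_interval(1.0, 2.0, 200, 0.10)
print(round(lo, 3), round(hi, 3))
```

With the example values the limits agree with the 0.766 and 1.234 obtained from the 90% curve.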

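For the variance confidence interval, we have the
relationship of Section 2.2, which gives

$$0.9 = \mathrm{Prob}\left[\left(\frac{199}{\chi^2_{199;0.05}}\right)(4.0) \le \sigma_x^2 \le \left(\frac{199}{\chi^2_{199;0.95}}\right)(4.0)\right].$$

Thus for $n = 199$ degrees of freedom and a 90% confidence interval,
Figure 2 gives $199/\chi^2_{199;0.05}$ as approximately 0.86, and
Figure 3 gives $199/\chi^2_{199;0.95}$ as approximately 1.2. Thus a
90% confidence interval for the true variance $\sigma_x^2$ runs from
approximately (0.86)(4.0) = 3.44 to approximately (1.2)(4.0) = 4.8.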
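
The variance interval can be checked in the same way. The following sketch is an assumption-laden illustration, not the memorandum's procedure: it substitutes the Wilson-Hilferty approximation for a chi-square table or the curves of Figs. 2 and 3, and its function names are hypothetical. Because the figure readings of 0.86 and 1.2 are coarse, the computed limits (roughly 3.4 and 4.8) differ slightly from values built on those readings.

```python
from statistics import NormalDist


def chi2_upper(n, tail):
    """Approximate chi^2_{n;tail}, defined by Prob(chi^2_n > chi^2_{n;tail}) = tail.

    Assumption: the Wilson-Hilferty cube approximation is used in place
    of a chi-square table.
    """
    z = NormalDist().inv_cdf(1.0 - tail)
    c = 2.0 / (9.0 * n)
    return n * (1.0 - c + z * c**0.5) ** 3


def variance_confidence_interval(s2, n, alpha):
    """(1 - alpha) interval: n s^2 / chi^2_{n;alpha/2} to n s^2 / chi^2_{n;1-alpha/2}."""
    lower = n * s2 / chi2_upper(n, alpha / 2.0)
    upper = n * s2 / chi2_upper(n, 1.0 - alpha / 2.0)
    return lower, upper


# Example values from the text: s^2 = 4.0, n = 199, 90% confidence.
lo, hi = variance_confidence_interval(4.0, 199, 0.10)
print(round(lo, 2), round(hi, 2))
```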

3.  SAMPLED DATA PROBABILITY DENSITY FUNCTION


This  section  discusses  a method  by  which  one  may  test 
the  equivalence  of  a measured  probability  density  function  for 
sampled  data  to  some  hypothesized  probability  density  function. 

The method is known as the chi-square goodness-of-fit test [1,
pages 146, 147]. Once a hypothetical density function is chosen,
this  test  can  be  used  to  determine  whether  the  overall  measured 
sampled-data  density  function  approaches  the  hypothetical  density. 

The general procedure of the chi-square goodness-of-fit
test involves the use of a statistic with an approximate chi-square
distribution as a measure of discrepancy between the
observed and hypothetical densities. Consider a series of
samples $\{x_i\}$, $i = 1, \ldots, N$, taken from a random function, in
which there are $N_I$ independent samples. Group the samples into $K$
bins to form a frequency histogram or discrete observed density
function.

Denote the number of independent samples falling within
the $i$th bin as $f_i$. Calculate the number of independent samples
expected to fall within the $i$th bin if indeed the hypothetical
density were correct, and denote this number by $F_i$, given by

$$F_i = N_I \int_{X_i}^{X_{i+1}} p_h(X)\, dX,$$

where

$N_I$ - is the number of independent samples
from the random function;

$p_h(X)$ - is the hypothetical probability density
function; and

$(X_i, X_{i+1})$ - is the interval describing the $i$th bin.
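
Since the integral of $p_h$ over a bin is a difference of the hypothetical distribution function at the bin edges, the $F_i$ are simple to compute. The sketch below assumes, purely for illustration, a unit normal hypothesis and four bins (fewer bins than the test normally calls for); the function name and bin edges are hypothetical.

```python
from statistics import NormalDist


def expected_counts(bin_edges, n_indep, cdf):
    """F_i = N_I * [P_h(X_{i+1}) - P_h(X_i)], where P_h is the hypothetical
    distribution function (the integral of p_h)."""
    return [
        n_indep * (cdf(upper) - cdf(lower))
        for lower, upper in zip(bin_edges[:-1], bin_edges[1:])
    ]


# Illustrative assumption: a zero-mean, unit-variance normal hypothesis.
hypothesis = NormalDist(mu=0.0, sigma=1.0)
edges = [float("-inf"), -1.0, 0.0, 1.0, float("inf")]

F = expected_counts(edges, 200, hypothesis.cdf)
print([round(value, 2) for value in F])
```

Using open-ended outer bins guarantees that the expected counts sum to $N_I$.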

The discrepancy between observed frequency and expected frequency
for the $i$th bin is then $(f_i - F_i)$, which may be used to find a
sample statistic $X^2$, given by

$$X^2 = \sum_{i=1}^{K} \frac{(f_i - F_i)^2}{F_i}.$$

The distribution for $X^2$ is approximately the same as a
chi-square ($\chi^2_n$) distribution, where $n$, the number of degrees of
freedom, is equal to $K$ minus the number of independent restrictions
on the observations. In general [2, page 177],

$$n = K - 1 - b,$$

where $b$ is the number of parameters in the population description
determined from the random sample. The normal distribution is
completely characterized by the mean and standard deviation, so

$$n = K - 3 \quad \text{(Normal Distribution).}$$

For Rayleigh statistics only one parameter is necessary,
so

$$n = K - 2 \quad \text{(Rayleigh Distribution).}$$

Once the quantity $X^2$ has been computed it may be tested
for goodness-of-fit. The region of acceptance is such that

$$X^2 \le \chi^2_{n;\alpha}.$$

That is, if the value of $X^2$ is less than or equal to the appropriate
chi-square variate, then the probability is at least $(1-\alpha)$ that the
samples under consideration are described by the assumed theoretical
density. Or, conversely, the probability of a sample deviating
from the assumed density is $\alpha$.

In order to apply the chi-square goodness-of-fit test,
the number of bins, $K$, must be chosen with care [1, page 147].
Table II describes the minimum number of bins for $N_I$ independent
samples and $\alpha = 0.05$.


TABLE II  MINIMUM OPTIMUM NUMBER OF BINS (K) FOR $N_I$
INDEPENDENT SAMPLES AND $\alpha = 0.05$.

$N_I$     200     400     600     800    1000    1500    2000

K          16      20      24      27      30      35      39

In addition to the stipulation in Table II, the chi-square
goodness-of-fit test works best when $N_I$ is large enough to
assure that at least 10 samples fall into each bin, especially
near the tails of the population density.

The following procedure is a summary of the chi-square
goodness-of-fit test for the case when the population distribution
is not known and the number of independent samples and bins have
been chosen correctly.

1.  Hypothesis H: The data $\{x_i\}$ are a sample of a
random variable with density $p_h(X)$.

2.  Purpose of test: To determine whether the data
$\{x_i\}$ may be considered as consistent with hypothesis H.

3.  Steps in test:

(a)  Form the statistic $X^2 = \sum_{i=1}^{K} \frac{(f_i - F_i)^2}{F_i}$.

(b)  Determine the number of degrees of freedom $n$.

(c)  Select a level of significance $\alpha$ = 0.01, 0.05, etc.

(d)  From tables of the $\chi^2_n$ distribution determine
$\chi^2_{n;\alpha}$ such that $\mathrm{Prob}(\chi^2_n \ge \chi^2_{n;\alpha}) = \alpha$.

(e)  If the test statistic $X^2$ is greater than
$\chi^2_{n;\alpha}$, then the hypothesis H is rejected at
the $\alpha$ level of significance.

(f)  If the test statistic $X^2$ is such that
$X^2 \le \chi^2_{n;\alpha}$, then the sample function may be
considered as consistent with hypothesis H.
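
Steps (a) through (f) above can be sketched as follows. This is an illustrative outline only: the critical value comes from the Wilson-Hilferty approximation rather than from chi-square tables, the function names are hypothetical, and the four-bin data are invented solely to exercise the code (Table II calls for many more bins in practice).

```python
from statistics import NormalDist


def chi2_critical(n, alpha):
    """Approximate chi^2_{n;alpha} with Prob(chi^2_n >= chi^2_{n;alpha}) = alpha.

    Assumption: Wilson-Hilferty approximation instead of a table lookup.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * n)
    return n * (1.0 - c + z * c**0.5) ** 3


def chi_square_gof(observed, expected, n_fitted_params, alpha=0.05):
    """Return (X2, n, critical value, accept) for the goodness-of-fit test."""
    # (a) form the statistic X^2 = sum (f_i - F_i)^2 / F_i
    x2 = sum((f - F) ** 2 / F for f, F in zip(observed, expected))
    # (b) degrees of freedom n = K - 1 - b
    n = len(observed) - 1 - n_fitted_params
    # (c), (d) critical value at the chosen level of significance
    critical = chi2_critical(n, alpha)
    # (e), (f) reject H if X^2 exceeds the critical value, else accept
    return x2, n, critical, x2 <= critical


# Invented illustration: observed and expected counts for N_I = 200, with
# the hypothetical density fully specified in advance (b = 0).
f_obs = [30, 70, 66, 34]
F_exp = [31.73, 68.27, 68.27, 31.73]
x2, n, critical, accept = chi_square_gof(f_obs, F_exp, n_fitted_params=0)
print(round(x2, 3), n, round(critical, 2), accept)
```

When the mean and standard deviation of a normal hypothesis are estimated from the same data, `n_fitted_params` would be 2, giving $n = K - 3$ as stated earlier.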

It should be observed at this point that for $n > 30$,
the $\chi^2_n$ distribution may be approximated by a normal distribution
with mean $n$ and standard deviation $\sqrt{2n}$. Also, the variable
$\sqrt{2\chi^2_n}$ is approximately normally distributed with mean
$\sqrt{2n-1}$ and unit standard deviation.
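
Both large-$n$ approximations lead directly to approximate critical values. The short sketch below is illustrative, with hypothetical function names; it evaluates the two forms for $n = 199$ and $\alpha = 0.05$, the case used in the variance example of Section 2.

```python
import math
from statistics import NormalDist


def chi2_upper_normal(n, alpha):
    """chi^2_n treated as normal with mean n and standard deviation sqrt(2n)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return n + z * math.sqrt(2.0 * n)


def chi2_upper_sqrt(n, alpha):
    """sqrt(2 chi^2_n) treated as normal with mean sqrt(2n - 1), unit deviation."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return 0.5 * (z + math.sqrt(2.0 * n - 1.0)) ** 2


print(round(chi2_upper_normal(199, 0.05), 1))  # about 231.8
print(round(chi2_upper_sqrt(199, 0.05), 1))    # about 232.6
```

Dividing 199 by either value gives approximately 0.86, consistent with the reading taken from Fig. 2 in the earlier example.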

4.  SAMPLED DATA PROBABILITY DISTRIBUTION FUNCTION
4.1  KOLMOGOROV  STATISTIC 

Often it is necessary or more convenient to obtain a
probability distribution function estimate for sampled random
data. The method of obtaining a confidence level described in
this section allows a confidence band to be placed on such a
distribution estimate. The method, known as the Kolmogorov (K)
statistic [3, page 452], is an alternative to the chi-square test
described in Section 3. The chi-square goodness-of-fit test
gives a general idea of how a sampled-data density function
varies about a hypothetical density. The K statistic allows one
to determine a confidence band about the distribution function
estimate, so that a confidence statement can be made with a
given probability.

In order to use the K statistic, the actual distribution
function describing the observed random process must be continuous.
Actual expressions for the K statistic are quite complicated;
however, asymptotic approximations are available and are listed
in Table III. The approximations are conservative for all values
of $N_I$ and satisfactory for $N_I \ge 80$. The approximations are
asymptotic in that they approach the true K statistic as $N_I$
increases.

Let $\hat{F}(x)$ be the estimate of the true probability distribution
function $F(x)$. Then $F(x)$ may be assigned the confidence
interval

$$\mathrm{Prob}\left[\left(\hat{F}(x) - d_\beta\right) \le F(x) \le \left(\hat{F}(x) + d_\beta\right)\right] = \beta$$

for all $x$, where $\beta$ is the desired confidence level and $(\pm d_\beta)$ is
the interval about $\hat{F}(x)$ within which $F(x)$ will fall 100$\beta$ percent
of the time, on the average. For example using Table III, if $N_I = 100$,
then for any $x$, $F(x)$ will be between $(\hat{F}(x) - 0.13581)$ and
$(\hat{F}(x) + 0.13581)$ 95% of the time, on the average.
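
A confidence band of this kind is straightforward to compute once the samples are sorted. The sketch below is a minimal illustration under stated assumptions: every sample passed in is taken to be independent (so $N_I$ equals the list length), the coefficients 1.3581 and 1.6276 are the Table III values, and clipping the band to the range 0 to 1 is an added convenience, not something the memorandum specifies. The function name is hypothetical.

```python
import math

# Table III coefficients: d_beta = coefficient / sqrt(N_I).
K_COEFFICIENT = {0.95: 1.3581, 0.99: 1.6276}


def kolmogorov_band(samples, beta=0.95):
    """Return (band, d_beta), where band lists (x, F_hat, lower, upper).

    Assumption: all samples are independent, so N_I = len(samples).
    """
    n_indep = len(samples)
    d_beta = K_COEFFICIENT[beta] / math.sqrt(n_indep)
    band = []
    for rank, x in enumerate(sorted(samples), start=1):
        f_hat = rank / n_indep  # empirical distribution function at x
        band.append((x, f_hat, max(0.0, f_hat - d_beta), min(1.0, f_hat + d_beta)))
    return band, d_beta


# With N_I = 100 the half-width is the 0.13581 quoted in the text.
band, d_beta = kolmogorov_band([0.01 * k for k in range(100)], beta=0.95)
print(round(d_beta, 5))
```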


TABLE III

KOLMOGOROV CONFIDENCE INTERVAL

Confidence Level, $\beta$      Confidence Interval, $d_\beta$

        0.95                   $1.3581/\sqrt{N_I}$

        0.99                   $1.6276/\sqrt{N_I}$

4.2  BINOMIAL STATISTIC

The K statistic just described gives a uniform bound on
the probability distribution function which may make the confidence
interval somewhat wide in some cases. An alternative is to make
the confidence limits dependent upon the observed distribution.
This may be accomplished by setting a threshold for the input random
signal and counting the number of samples falling above and below
this threshold. It is here that the binomial distribution is
introduced. Note that this is essentially the same as the sorting
procedure used to find a sampled data probability distribution
function.

The probability that a sample will fall below a
threshold $x_0$ is $F(x_0)$, while the sorting procedure yields $\hat{F}(x_0)$.
Using the binomial distribution, it is possible to set confidence
intervals about $F(x_0)$. Mathematical details of this process will
be the subject of a forthcoming memorandum.

The results of this method are given in Figs. 4 through
17. Each figure includes curves for a given $N_I$ and several
confidence levels. For each level, confidence limits are
plotted for $\hat{F}(x)$ in terms of the true value $F(x)$. The curves
may be utilized in the following manner. For a value of $\hat{F}(x)$
obtained with $N_I$ independent samples, choose the proper figure and
confidence level. Draw a line parallel to the abscissa ($F(x)$),
with ordinate ($\hat{F}(x)$). This line will intersect the proper curves
for the chosen level of confidence. The values of $F(x)$ at which
the straight line intersects the proper curves give the interval
about $\hat{F}(x)$ within which $F(x)$ will lie with the given confidence.
For $N_I \ge 20$, logarithmic curves are presented to give more
detail about the tails of the distributions. Note the symmetry
present in the curves.
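
The memorandum defers the mathematical details, so the construction behind Figs. 4 through 17 is not stated here. One standard way to obtain binomial confidence limits, offered only as an assumed stand-in and not as the method used for the figures, is the exact (Clopper-Pearson) interval sketched below. It takes the count $k$ of independent samples falling below the threshold, so that $\hat{F}(x_0) = k/N_I$; the function names are hypothetical, and the direct summation is intended for moderate $N_I$ (up to about 1000).

```python
import math


def binom_cdf(k, n, p):
    """Prob(X <= k) for X binomial with n trials and success probability p."""
    return sum(math.comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k + 1))


def _root_of_decreasing(func, iterations=60):
    """Bisection on [0, 1] for a function that decreases through zero."""
    low, high = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if func(mid) > 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def binomial_limits(k, n_indep, beta):
    """Clopper-Pearson limits on F(x0) given k of n_indep samples below x0."""
    tail = (1.0 - beta) / 2.0
    # Lower limit: the p at which Prob(X >= k) rises to tail.
    lower = 0.0 if k == 0 else _root_of_decreasing(
        lambda p: tail - (1.0 - binom_cdf(k - 1, n_indep, p)))
    # Upper limit: the p at which Prob(X <= k) falls to tail.
    upper = 1.0 if k == n_indep else _root_of_decreasing(
        lambda p: binom_cdf(k, n_indep, p) - tail)
    return lower, upper


# Illustration: 5 of 10 independent samples below the threshold, beta = 0.95.
lower, upper = binomial_limits(5, 10, 0.95)
print(round(lower, 4), round(upper, 4))
```

The limits are symmetric about 0.5 when $\hat{F}(x_0) = 0.5$, which mirrors the symmetry noted in the curves.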

Fig. 4  CONFIDENCE INTERVALS FOR DISTRIBUTION FUNCTION, NUMBER OF INDEPENDENT SAMPLES = 4 (confidence levels 0.95, 0.90, 0.80)

Fig. 5  CONFIDENCE INTERVALS FOR DISTRIBUTION FUNCTION, NUMBER OF INDEPENDENT SAMPLES = 10

Fig. 6  CONFIDENCE INTERVALS FOR DISTRIBUTION FUNCTION (axis: ONE MINUS ACTUAL VALUE OF F(X))

Fig. 7  CONFIDENCE INTERVALS FOR DISTRIBUTION FUNCTION, NUMBER OF INDEPENDENT SAMPLES = 20 (axis: ESTIMATE OF F(X))

Fig. 8  CONFIDENCE INTERVALS FOR DISTRIBUTION FUNCTION, NUMBER OF INDEPENDENT SAMPLES = 50 (confidence levels 0.95, 0.90, 0.80)

Fig. 10  CONFIDENCE INTERVALS FOR DISTRIBUTION FUNCTION, NUMBER OF INDEPENDENT SAMPLES = 100 (confidence levels 0.95, 0.90, 0.80)

Fig. 12  CONFIDENCE INTERVALS FOR DISTRIBUTION FUNCTION, NUMBER OF INDEPENDENT SAMPLES = 200

Fig. 13  CONFIDENCE INTERVALS FOR DISTRIBUTION FUNCTION, NUMBER OF INDEPENDENT SAMPLES = 200 (confidence levels 0.95, 0.90, 0.80)

Fig. 14  CONFIDENCE INTERVALS FOR DISTRIBUTION FUNCTION, NUMBER OF INDEPENDENT SAMPLES = 500

Fig. 15  CONFIDENCE INTERVALS FOR DISTRIBUTION FUNCTION, NUMBER OF INDEPENDENT SAMPLES = 500 (confidence levels 0.95, 0.90, 0.80)

Fig. 16  CONFIDENCE INTERVALS FOR DISTRIBUTION FUNCTION, NUMBER OF INDEPENDENT SAMPLES = 1000 (axis: ONE MINUS ACTUAL VALUE OF F(X))

Fig. 17  CONFIDENCE INTERVALS FOR DISTRIBUTION FUNCTION, NUMBER OF INDEPENDENT SAMPLES = 1000 (confidence levels 0.95, 0.90, 0.80)